-
Notifications
You must be signed in to change notification settings - Fork 3.8k
CASSANDRA-18831 Add JDK21 support #3696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
|
|
|
FYI if you want to retain a max tenuring threshold of 1, you can specify -XX:ZTenuringThreshold=1 with generational ZGC. But honestly it will probably figure things out automatically anyway. |
ekaterinadimitrova2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a full review but high-level remarks.
Most of all:
- I am not sure we should change config parameters based on the assumption people will use directly JDK21 on the next major. Check my comments. Also, probably we need to bring to the ML if we want to move to ZGC. Maybe you can share some test results from your practice with it and benefits.
- I would not add 21 to supported versions in build.xml until we start running JDK21 CI.
We have an option that allows people to experiment with 21: CASSANDRA_JDK_UNSUPPORTED - There is one add-export that is incorrect and can be removed.
755dd15 to
5f85b68
Compare
ekaterinadimitrova2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some general comments for things that I see. No big blockers on my end.
Leaving to @michaelsembwever the full review as my understanding is he already did that and I have full trust on you too trusted PMC members. :-)
I am fine with the new ZGC added as long as we bring it to the mailing list and it is not yet advertised probably as production ready? I believe we already agreed on that but just leaving here for completeness and awareness.
Just a few notable things to be addressed:
some tests' changes were not reverted after we agreed not to change defaults for config - this breaks config tests.
For some unknown to me reason I do not find cassandra listed here - https://ci-builds.apache.org/ and I couldn't verify the rest of the test failures.
I am a bit confused as some tests still have failures related to jamm despite the settings being tweaked to workaround some mentioned parameter. Will those be solved with the jamm PR being merged? They are expected? I am sure I missed something here but I don't know what :-)
-Dnet.bytebuddy.experimental - I still see it in the jvm config file and I thought we figured it is not needed? Maybe I misunderstood something?
I am not familiar with the latest CI scripts, so I definitely defer that to @michaelsembwever .
Also, simulator code is not something I know. The rest I checked for immediate obvious issues. Hope that feedback helps here. Thank you for your work!
NEWS.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great, do we recommend this in prod already? I don't think I am the right person to review the selected settings and say they are good for prod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker for this ticket and we do not have test coverage etc, neither I think it is a blocker but as a future reminder - I recommend adding simple unit tests to the DatabaseDescriptorTest.
test/unit/org/apache/cassandra/config/LoadOldYAMLBackwardCompatibilityTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question, about the change of default
|
k, signposting intent here @ekaterinadimitrova2 and @michaelsembwever:
I'm going to set this PR "on deck" for a week or two to address another priority, then will work through all this in a closed loop and surface if I get blocked on or need a 2nd pair of eyes on anything. That way we can break out of the long-async cycles we've been having on review / integration on this branch and hopefully push it across the line. |
11f6c88 to
4e9ba80
Compare
df3eb40 to
54e39a9
Compare
d032dca to
97f4ce1
Compare
|
Hello, Appreciate all the work the team is doing on Cassandra! |
The hold up is CI validation. I have 1 pipeline left (python upgrade dtests) to get working on our internal CI infrastructure then it should only be a couple of weeks to burn down test failures. The code as written here is functionally identical to what we're running internally on a fork on a very large fleet w/all CI passing so if you wanted to create a build from this patch to tinker with, you should be in a very solid place. I'm going to be traveling next week and we have the community over code conference coming up in a month so it's not an ideal season for keeping things moving along but I certainly haven't forgotten about this work, it just got shuffled a bit in priorities but I'm almost dug out from under things. |
97f4ce1 to
d74591a
Compare
|
Without such flags, a piece of software is highly likely to work, unchanged, on future JDK releases for a long time, enjoying performance and observability improvements for free. But every single one of them that is removed can contribute to the project's stability and portability, so I would recommend taking the time to make sure they're really needed. |
https://github.com/apache/cassandra/blob/trunk/conf/jvm17-server.options#L69 |
Ah. In that case, the code is living on borrowed time and has just been lucky. Every release increases the chance of things breaking. Remember that the reason for encapsulation of JDK internals (or one of the several reasons) is to give the JDK more freedom to change internals more liberally, and we've been taking advantage of the freedom. The pace of changes to internals is expected to continue growing. I had a look at a couple of uses of internal access, and they now have standard replacements (albeit in JDK 22 and on). |
|
@pron - you are not wrong. Also, I would like to mention that on JDK17, we reduced the number of add-opens and add-exports, especially with the update of the jamm library and other code changes during the upgrade. We will be happy to continue improving things when there is a chance.
That is great. Please feel free to open tickets with your findings. It would be even better if you had patches, but even just tickets would be great. Thank you for your input here. Highly appreciated |
3e000d9 to
bfe0afc
Compare
2cbddc1 to
d10fb89
Compare
Patch by Josh McKenzie; reviewed by TBD for CASSANDRA-18831 Co-authored-by: Aleksey Yeschenko Co-authored-by: Achilles Benetopoulos Co-authored-by: Mick Semb Wever
- Unified log vs. warn relationship checking under a single method in DatabaseDescriptor
…s to prevent OOM failure stalls from forked JVM tests
8eb0db2 to
0e21912
Compare
| ws("workspace/${JOB_NAME}/${BUILD_NUMBER}/${cell.step}/${cell.arch}/jdk-${cell.jdk}/python-${cell.python}") { | ||
| fetchSource(cell.step, cell.arch, cell.jdk) | ||
| fetchDockerImages(['ubuntu2004_test']) | ||
| fetchDockerImages(['ubuntu-test']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 346 also needs a fix !! @jmckenzie-dev
fetchDockerImages("redhat" == cell.step ? ['almalinux-build'] : ['debian-build'])
| # Don't let ZGC return unclaimed memory to the OS by setting min and max == | ||
| # -Xms12G </string> | ||
| # -Xmx12G </string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this redundant ? (it being done already in cassandra-env.sh)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. In cassandra-env.sh we have the following:
# We only set -Xms and -Xmx if they were not defined on jvm-server.options file
# If defined, both Xmx and Xms should be defined together.
if [ $DEFINED_XMX -ne 0 ] && [ $DEFINED_XMS -ne 0 ]; then
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
elif [ $DEFINED_XMX -ne 0 ] || [ $DEFINED_XMS -ne 0 ]; then
echo "Please set or unset -Xmx and -Xms flags in pairs on jvm-server.options file."
exit 1
fi
This implies to me that we expect to have the values exist in both places and one takes precedent.
| # Young generation size is automatically calculated by cassandra-env | ||
| # based on this formula: min(100 * num_cores, 1/4 * heap size) | ||
| # | ||
| # The main trade-off for the young generation is that the larger it | ||
| # is, the longer GC pause times will be. The shorter it is, the more | ||
| # expensive GC will be (usually). | ||
| # | ||
| # It is not recommended to set the young generation size if using the | ||
| # G1 GC, since that will override the target pause-time goal. | ||
| # More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html | ||
| # | ||
| # The example below assumes a modern 8-core+ machine for decent | ||
| # times. If in doubt, and if you do not particularly want to tweak, go | ||
| # 100 MB per physical CPU core. | ||
| #-Xmn800M |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also exists here: https://github.com/jmckenzie-dev/cassandra/blob/1564529f5b7dce3a20f10dfd0add2f146b3991de/conf/jvm11-server.options#L53-L58
where it is better placed, being only relevant to CMS.
GenZGC does not have the young/old distinction, and G1 uses -XX:G1NewSizePercent=50 here: https://github.com/jmckenzie-dev/cassandra/blob/1564529f5b7dce3a20f10dfd0add2f146b3991de/conf/jvm11-server.options#L70C1-L70C24
i think what's highlighted here can simply be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably here to provide a warning that setting it in the G1 case is ill-advised. -Xmn is noop for ZGC but G1 users could trip across this.
| throw new IllegalArgumentException("Threshold value for gc_warn_threshold must be greater than or equal to 0"); | ||
|
|
||
| if (threshold > Integer.MAX_VALUE) | ||
| throw new IllegalArgumentException("Threshold must be less than Integer.MAX_VALUE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the parameter type should be long but still be under Integer.MAX_VALUE it (IMHO) deserves a comment as to why, as it looks odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The int to long conversion predates this patch; I just kind of "When in Rome'ed" it.
DatabaseDescriptor.setGCLogThreshold((int) DatabaseDescriptor.getGCLogThreshold());
DatabaseDescriptor.setGCWarnThreshold((int) DatabaseDescriptor.getGCWarnThreshold());
I could dig around a bit to try and figure out why these return a long - comes from JJirsa in 2016. My guess is that we didn't want to change the type but did want to sanity check and constrain the values to permissible ranges, but I'm not sure tbh.
| if (threshold < 0) | ||
| throw new IllegalArgumentException("Threshold value for gc_warn_zgc_threshold must be greater than or equal to 0"); | ||
|
|
||
| if (threshold > Integer.MAX_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
michaelsembwever
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
complete review done. a few comments to address.
Add JDK21 support
Patch by jmckenzie; reviewed by TBD for CASSANDRA-18831